Beyond instinct

Crafting well-being interventions with people analytics

Damiano D’Urso Ph.D., People Data Scientist

Rabobank, NL

Learning Outcomes

  • Understand what People Analytics (PA) is
  • Understand how PA can be useful for well-being interventions
  • Identify key steps and challenges of a PA project
  • Apply PA project lifecycle steps

Content

  • General Introduction
  • People Analytics
  • People Analytics project lifecycle
    • Case studies
  • Summary
  • Resources

General Introduction

About me

  • M.Sc. in Psychology @ University of Catania
  • Research internship in Psychological Methods @ University of Amsterdam
  • Ph.D. in Methodology and Statistics @ Tilburg University
  • People Data Scientist @ Rabobank

About Rabobank

People @ Rabobank

  • ~45k employees (~30k in the Netherlands)
  • Different chapters within HR (~850 employees)
  • People Data and Innovation (~40 members)
  • Diverse expertise and cultural backgrounds

What is People Analytics?

Data Science definition

Data science is a “concept to unify statistics, data analysis, informatics, and their related methods” to “understand and analyze actual phenomena” with data. It uses techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, information science, and domain knowledge (Wikipedia, 2023)

Data Science

In a Nutshell..

Data Science = Make Data Useful

People Analytics (or People Data Science)

  • “The analysis of employee and workforce data to reveal insights and provide recommendations to improve business outcomes” (Ferrer and Green, 2021)

  • “The organizational function within which data collection, analyses, and translation occur as well as a set of practices that draw on employee data to inform and aid decision-making processes and employee activity throughout the organization” (Polzer, 2022)

People Analytics

In a Nutshell..

People Analytics = Make People’s Data Useful

Why People Analytics

People Analytics provides tools, methodologies, and techniques to extract meaning out of employee data and make this data useful to:

  • Interpret large volumes of data;
  • Identify trends and patterns in employee data;
  • Help to predict organizations’ and employees’ needs;
  • Prioritize HR activities based on impact, utility, and return on investment;
  • Reduce subjectivity and make decision-making transparent.

People Analytics Examples

Some interesting applications:

  • Employee retention
  • Enhancing employees’ well-being
  • Discovering occupational health risk factors
  • Reducing the pay gap between groups
  • Optimizing recruitment and hiring
  • Learning and development
  • Increasing diversity
  • … etc.

PA project lifecycle

flowchart 
  %%| fig-width: 10
    A(Business Problem Discovery) --> B(Data Selection)
    B --> C(Data Preparation)
    C --> D(Data Analysis)
    D --> E(Interpretation and Storytelling)
    E --> F(Implementation and Feedback)
    F --> A

Business problem discovery

Business Problem Discovery

In this phase, there are a few primary goals:

  • Identify Problem
  • Define objectives and success metrics
  • Determine data sources and team

1) Identify problem

  • What is the problem?
  • Is it a problem? What’s the business value?
  • Who are the stakeholders involved?
  • What is the scope of the problem?
  • What is the problem time frame?

2) Define objectives

We normally try to answer these types of questions with People Analytics:

  • How much? (regression)
  • Which category? (classification)
  • Which group? (clustering)
  • Which option should be taken? (recommendation system)

2) Define success metrics

  • Specific
  • Measurable
  • Achievable
  • Relevant
  • Time-bound

3) Determine data sources and team

To ensure completeness of information and responsibilities, it is essential to:

  • Gather information about existing:
    • available data
    • reports and previous projects
    • documentation
  • Clarify roles and responsibilities

Example

The IT Department has been a critical part of the organization, driving innovation and ensuring systems run smoothly. However, we’ve noticed an alarming increase in stress-related sick leave (~15%) among IT employees in the last year, which is higher than the industry standard (7%). This not only affects their well-being but also disrupts our operations. To address this issue, we aim to develop a stress-risk classification system to identify high-stress cases early and implement targeted interventions. Our goal is to create a healthier work environment for our IT team.

Example

  • Identify problem:

    • Problem: Manage and reduce workplace stress in the IT Department
    • Problem size: Stress-related sick leave has increased in the last year (+15%) and is higher than industry benchmarks. Also, the number of IT teams with numerous sick leaves has been increasing.
    • Stakeholders: direct (IT dept. employees); indirect (managers, HR)
    • Scope: IT Department employees
    • Time frame: 1 year
  • Define objectives: Classify employees into stress risk categories and tailor interventions.

  • Success metrics: Decrease the number of high-stress cases by 8% within one year from the application of data-driven interventions.
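A minimal sketch of how such a success metric could be tracked: the snippet bins stress scores into risk categories and checks whether the number of high-stress cases dropped by at least 8%. The 0-10 score scale, the cut-offs `LOW_MAX` and `HIGH_MIN`, and all scores are hypothetical assumptions made for illustration; they are not part of the case description.

```python
from collections import Counter

# Hypothetical cut-offs on an assumed 0-10 stress score (not from the case).
LOW_MAX = 3.0
HIGH_MIN = 7.0
TARGET_REDUCTION = 0.08  # success metric: 8% fewer high-stress cases


def risk_category(score: float) -> str:
    """Map a stress score to a risk category using the assumed cut-offs."""
    if score >= HIGH_MIN:
        return "high"
    if score > LOW_MAX:
        return "medium"
    return "low"


def relative_reduction(before: int, after: int) -> float:
    """Relative decrease from `before` to `after` (negative if it increased)."""
    if before == 0:
        raise ValueError("baseline count must be positive")
    return (before - after) / before


# Invented scores for the same team at baseline and one year later.
baseline_scores = [8.1, 7.4, 2.0, 5.5, 9.0, 7.0, 3.0, 6.2, 7.9, 8.8]
followup_scores = [6.5, 7.4, 2.0, 5.0, 8.2, 6.1, 3.0, 6.0, 7.2, 8.0]

baseline = Counter(risk_category(s) for s in baseline_scores)
followup = Counter(risk_category(s) for s in followup_scores)

reduction = relative_reduction(baseline["high"], followup["high"])
target_met = reduction >= TARGET_REDUCTION
print(baseline["high"], followup["high"], round(reduction, 3), target_met)
```

In a real project, the categories would come from the classification model rather than fixed cut-offs, and the comparison would need to account for changes in team size.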

Exercise

Select one of the business cases available on CANVAS and a group of students to work with.

Throughout this lecture, you and your group will work as People Analytics experts and go through each of the PA lifecycle steps.

Exercise

For this step, discuss with others in your group how to define the following aspects based on the case description:

  • Identify problem
    • Problem
    • Problem size
    • Stakeholders
    • Scope
  • Define objective
  • Hypothesize Success Metrics

Data Acquisition

flowchart 
  %%| fig-width: 10
    A(Business Problem Discovery) --> B(Data Selection)

Data Acquisition

Data acquisition (the Data Selection step in the lifecycle diagram) refers to collecting, retrieving, gathering, and sourcing data.

Data

  • Surveys
    • Pros: affordable, familiar, and (if well designed!!) very effective
    • Cons: bias-sensitive and could induce fatigue
  • Performance reviews and rating forms
    • Pros: efficient and (potentially) informative
    • Cons: bias-sensitive and (often) unreliable
  • Surveillance and monitoring
    • Pros: rich, objective behavioral measures
    • Cons: intrusive, costly to store
  • Organisational information
    • Pros: Cheap and easily collected
    • Cons: varying data quality
  • Text and Scraped data
    • Pros: Rich and new tools facilitate text extraction and analysis
    • Cons: Privacy sensitive and require heavy pre-processing

Summary data

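Method     | Privacy | Resource | Objectivity | Familiarity | Complexity
-----------|---------|----------|-------------|-------------|-----------
Surveys    | +       | =        | =           | +           | =
Rating     | =       | =        | =           | =           | =
Monitoring | =       | =        | +           | =           | +
DB Queries | =       | =        | +           | =           | +
Scraping   | -       | +        | =           | =           | +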

Legend: (+) High; (=) Middle; (-) Low

Exercise

Discuss, within your group, what type of data you would like to use for your project. The data can also be other than the types we just discussed. Elaborate on your choice and explain:

  • Advantages of using the (combination of) data you proposed
  • Disadvantages of using the (combination of) data you proposed

Data Preparation

flowchart 
  %%| fig-width: 10
    A(Business Problem Discovery) --> B(Data Selection)
    B(Data Selection) --> C(Data Preparation)

Data Quality issues

Selecting the data that we deem appropriate does not, in itself, guarantee good quality.

Data may be compromised before (or after) acquisition. Typical data quality issues are often due to the following:

  • Incompleteness: missing values or missing attributes;
  • Noisiness: recording errors or outliers;
  • Inconsistencies: conflicting records or discrepancies.

Data pre-processing

To ensure good quality levels, it is essential to:

  • Conduct data health screens
  • Pre-process data

Data health screens

Data Health can be generally assessed by checking:

  • Record Count: Determine the total number of records in the dataset.

  • Variables Count: Identify the number of variables or features in the dataset.

  • Data Types: Check that each attribute’s actual type matches the expected one (e.g., nominal).

  • Missing Values: Count and assess missing values within the dataset.

  • Consistency: Examine data records for inconsistencies, such as verifying that values fall within specified ranges (e.g., 18 < age < 80)
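The checks listed above can be sketched in a few lines of plain Python. The records, the `expected_types` mapping (standing in for a data dictionary), and the function name `health_screen` are all invented for illustration; only the age range 18 < age < 80 comes from the example above.

```python
# Toy records mimicking an HR extract; all values are invented.
records = [
    {"emp_id": 1, "age": 34, "dept": "IT", "salary": 62000.0},
    {"emp_id": 2, "age": None, "dept": "IT", "salary": 58000.0},
    {"emp_id": 3, "age": 17, "dept": "HR", "salary": "n/a"},
    {"emp_id": 4, "age": 45, "dept": None, "salary": 71000.0},
]

# Expected type per variable (an assumption standing in for a data dictionary).
expected_types = {"emp_id": int, "age": int, "dept": str, "salary": float}


def health_screen(rows, types, age_range=(18, 80)):
    """Return basic data health indicators for a list of dict records."""
    variables = sorted({key for row in rows for key in row})
    # Missing values: count None entries per variable.
    missing = {v: sum(row.get(v) is None for row in rows) for v in variables}
    # Data types: non-missing values whose type differs from the expected one.
    type_errors = {
        v: sum(
            row.get(v) is not None and not isinstance(row[v], types[v])
            for row in rows
        )
        for v in variables
        if v in types
    }
    # Consistency: ages outside the open interval low < age < high.
    low, high = age_range
    age_out_of_range = [
        row["emp_id"]
        for row in rows
        if isinstance(row.get("age"), int) and not (low < row["age"] < high)
    ]
    return {
        "record_count": len(rows),
        "variable_count": len(variables),
        "missing": missing,
        "type_errors": type_errors,
        "age_out_of_range": age_out_of_range,
    }


report = health_screen(records, expected_types)
for check, result in report.items():
    print(f"{check}: {result}")
```

Each indicator only flags a potential issue; deciding how to handle it belongs to the pre-processing step.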

Data pre-processing

Data pre-processing and cleaning are essential steps in preparing data for analysis. Typical steps taken during data pre-processing and cleaning include:

  • Handling Missing Values: Imputing or removing missing values.

  • Duplicate Detection: Identify and remove duplicate records.

  • Data Transformation: Convert data into a suitable format (e.g., standardization).

  • Outlier Detection: Identify and handle outliers.

  • Data Encoding: Encode categorical variables into numerical values to make them usable for analysis.

  • Data Discretization: Divide continuous variables into bins or categories for analysis.

  • Data Aggregation: Aggregate data to a higher level (e.g., monthly or yearly) for trend analysis.

  • Text Data Processing: Tokenize and preprocess text data.

  • Data Validation: Validate data against predefined business rules or HR policies.

Remember!! Keep detailed records of data pre-processing steps for transparency and reproducibility.
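As a rough illustration, the sketch below applies a handful of these steps (duplicate removal, median imputation, standardization, one-hot encoding, and discretization) to a toy dataset while logging each step. The data, the column names, and the cut-offs in `absence_band` are hypothetical; median imputation and z-scores are just one possible choice among many.

```python
import statistics

# Invented toy data; emp_id 2 appears twice and has a missing value.
rows = [
    {"emp_id": 1, "dept": "IT", "absences": 4},
    {"emp_id": 2, "dept": "HR", "absences": None},
    {"emp_id": 2, "dept": "HR", "absences": None},
    {"emp_id": 3, "dept": "IT", "absences": 16},
    {"emp_id": 4, "dept": "Sales", "absences": 10},
]
log = []  # record of each step, for transparency and reproducibility

# 1) Duplicate detection: keep the first occurrence of each emp_id.
seen, deduped = set(), []
for row in rows:
    if row["emp_id"] not in seen:
        seen.add(row["emp_id"])
        deduped.append(dict(row))
log.append(f"removed {len(rows) - len(deduped)} duplicate record(s)")

# 2) Missing values: impute with the median of the observed values.
observed = [r["absences"] for r in deduped if r["absences"] is not None]
median_absences = statistics.median(observed)
for r in deduped:
    if r["absences"] is None:
        r["absences"] = median_absences
log.append(f"imputed missing absences with median {median_absences}")

# 3) Transformation: standardize absences to z-scores.
values = [r["absences"] for r in deduped]
mean, sd = statistics.mean(values), statistics.stdev(values)
for r in deduped:
    r["absences_z"] = (r["absences"] - mean) / sd
log.append("standardized absences (z-scores, sample sd)")

# 4) Encoding: one-hot encode the department.
departments = sorted({r["dept"] for r in deduped})
for r in deduped:
    for d in departments:
        r[f"dept_{d}"] = int(r["dept"] == d)
log.append(f"one-hot encoded dept into {len(departments)} columns")


# 5) Discretization: bin absences using assumed cut-offs.
def absence_band(n):
    return "low" if n < 5 else "medium" if n < 15 else "high"


for r in deduped:
    r["absence_band"] = absence_band(r["absences"])
log.append("binned absences into low/medium/high")

print("\n".join(log))
```

Keeping the `log` list next to the data is a lightweight way to document what was done and in which order.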

Data screening example

Inspecting and visualizing the data is a good starting point for assessing its properties.

Employee_Name | EmpID | MarriedID | MaritalStatusID | GenderID | EmpStatusID | DeptID | PerfScoreID | FromDiversityJobFairID | Salary | Termd | PositionID | Position | State | Zip | DOB | Sex | MaritalDesc | CitizenDesc | HispanicLatino | RaceDesc | DateofHire | DateofTermination | TermReason | EmploymentStatus | Department | ManagerName | ManagerID | RecruitmentSource | PerformanceScore | EngagementSurvey | EmpSatisfaction | SpecialProjectsCount | LastPerformanceReview_Date | DaysLateLast30 | Absences
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Adinolfi, Wilson K | 10026 | 0 | 0 | 1 | 1 | 5 | 4 | 0 | 62506 | 0 | 19 | Production Technician I | MA | 1960 | 07/10/83 | M | Single | US Citizen | No | White | 7/5/2011 |  | N/A-StillEmployed | Active | Production | Michael Albert | 22 | LinkedIn | Exceeds | 4.60 | 5 | 0 | 1/17/2019 | 0 | 1
Ait Sidi, Karthikeyan | 10084 | 1 | 1 | 1 | 5 | 3 | 3 | 0 | 104437 | 1 | 27 | Sr. DBA | MA | 2148 | 05/05/75 | M | Married | US Citizen | No | White | 3/30/2015 | 6/16/2016 | career change | Voluntarily Terminated | IT/IS | Simon Roup | 4 | Indeed | Fully Meets | 4.96 | 3 | 6 | 2/24/2016 | 0 | 17
Akinkuolie, Sarah | 10196 | 1 | 1 | 0 | 5 | 5 | 3 | 0 | 64955 | 1 | 20 | Production Technician II | MA | 1810 | 09/19/88 | F | Married | US Citizen | No | White | 7/5/2011 | 9/24/2012 | hours | Voluntarily Terminated | Production | Kissy Sullivan | 20 | LinkedIn | Fully Meets | 3.02 | 3 | 0 | 5/15/2012 | 0 | 3
Alagbe,Trina | 10088 | 1 | 1 | 0 | 1 | 5 | 3 | 0 | 64991 | 0 | 19 | Production Technician I | MA | 1886 | 09/27/88 | F | Married | US Citizen | No | White | 1/7/2008 |  | N/A-StillEmployed | Active | Production | Elijiah Gray | 16 | Indeed | Fully Meets | 4.84 | 5 | 0 | 1/3/2019 | 0 | 15
Anderson, Carol | 10069 | 0 | 2 | 0 | 5 | 5 | 3 | 0 | 50825 | 1 | 19 | Production Technician I | MA | 2169 | 09/08/89 | F | Divorced | US Citizen | No | White | 7/11/2011 | 9/6/2016 | return to school | Voluntarily Terminated | Production | Webster Butler | 39 | Google Search | Fully Meets | 5.00 | 4 | 0 | 2/1/2016 | 0 | 2
Anderson, Linda | 10002 | 0 | 0 | 0 | 1 | 5 | 4 | 0 | 57568 | 0 | 19 | Production Technician I | MA | 1844 | 05/22/77 | F | Single | US Citizen | No | White | 1/9/2012 |  | N/A-StillEmployed | Active | Production | Amy Dunn | 11 | LinkedIn | Exceeds | 5.00 | 5 | 0 | 1/7/2019 | 0 | 15

Data screening example

Constructing descriptive statistics is also helpful to assess data properties, range, etc.

vars n mean sd median trimmed mad min max range skew kurtosis se
Employee_Name* 1 311 NaN NA NA NaN NA Inf -Inf -Inf NA NA NA
EmpID* 2 311 10156.000 89.922 10156.00 10156.000 115.643 10001.00 10311 310.00 0.000 -1.212 5.099
MarriedID* 3 311 0.399 0.490 0.00 0.373 0.000 0.00 1 1.00 0.412 -1.836 0.028
MaritalStatusID* 4 311 0.810 0.943 1.00 0.651 1.483 0.00 4 4.00 1.395 1.969 0.053
GenderID* 5 311 0.434 0.496 0.00 0.418 0.000 0.00 1 1.00 0.265 -1.936 0.028
EmpStatusID* 6 311 2.392 1.794 1.00 2.241 0.000 1.00 5 4.00 0.626 -1.494 0.102
DeptID* 7 311 4.611 1.083 5.00 4.723 0.000 1.00 6 5.00 -1.522 2.153 0.061
PerfScoreID* 8 311 2.977 0.587 3.00 3.024 0.000 1.00 4 3.00 -1.236 3.921 0.033
FromDiversityJobFairID* 9 311 0.093 0.291 0.00 0.000 0.000 0.00 1 1.00 2.784 5.770 0.017
Salary* 10 311 69020.685 25156.637 62810.00 64523.671 11834.113 45046.00 250000 204954.00 3.274 15.069 1426.502
Termd* 11 311 0.334 0.473 0.00 0.293 0.000 0.00 1 1.00 0.699 -1.517 0.027
PositionID* 12 311 16.846 6.223 19.00 17.647 1.483 1.00 30 29.00 -1.220 0.756 0.353
Position* 13 311 NaN NA NA NaN NA Inf -Inf -Inf NA NA NA
State* 14 311 NaN NA NA NaN NA Inf -Inf -Inf NA NA NA
Zip* 15 311 6555.482 16908.397 2132.00 2170.173 340.998 1013.00 98052 97039.00 4.066 15.788 958.787
DOB* 16 311 NaN NA NA NaN NA Inf -Inf -Inf NA NA NA
Sex* 17 311 NaN NA NA NaN NA Inf -Inf -Inf NA NA NA
MaritalDesc* 18 311 NaN NA NA NaN NA Inf -Inf -Inf NA NA NA
CitizenDesc* 19 311 NaN NA NA NaN NA Inf -Inf -Inf NA NA NA
HispanicLatino* 20 311 NaN NA NA NaN NA Inf -Inf -Inf NA NA NA
RaceDesc* 21 311 NaN NA NA NaN NA Inf -Inf -Inf NA NA NA
DateofHire* 22 311 NaN NA NA NaN NA Inf -Inf -Inf NA NA NA
DateofTermination* 23 311 NaN NA NA NaN NA Inf -Inf -Inf NA NA NA
TermReason* 24 311 NaN NA NA NaN NA Inf -Inf -Inf NA NA NA
EmploymentStatus* 25 311 NaN NA NA NaN NA Inf -Inf -Inf NA NA NA
Department* 26 311 NaN NA NA NaN NA Inf -Inf -Inf NA NA NA
ManagerName* 27 311 NaN NA NA NaN NA Inf -Inf -Inf NA NA NA
ManagerID* 28 303 14.571 8.078 15.00 14.251 5.930 1.00 39 38.00 0.752 1.532 0.464
RecruitmentSource* 29 311 NaN NA NA NaN NA Inf -Inf -Inf NA NA NA
PerformanceScore* 30 311 NaN NA NA NaN NA Inf -Inf -Inf NA NA NA
EngagementSurvey* 31 311 4.110 0.790 4.28 4.210 0.726 1.12 5 3.88 -1.106 1.100 0.045
EmpSatisfaction* 32 311 3.891 0.909 4.00 3.916 1.483 1.00 5 4.00 -0.220 -0.784 0.052
SpecialProjectsCount* 33 311 1.219 2.349 0.00 0.711 0.000 0.00 8 8.00 1.524 0.589 0.133
LastPerformanceReview_Date* 34 311 NaN NA NA NaN NA Inf -Inf -Inf NA NA NA
DaysLateLast30* 35 311 0.415 1.295 0.00 0.012 0.000 0.00 6 6.00 3.113 8.595 0.073
Absences* 36 311 10.238 5.853 10.00 10.177 7.413 1.00 20 19.00 0.029 -1.311 0.332
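To make a few of these columns concrete, the sketch below computes n, mean, sd, median, min, max, range, and se for the six salaries shown in the data preview, using only the Python standard library. It assumes that `se` is the standard error of the mean, sd / sqrt(n), which is consistent with the Salary row (25156.637 / sqrt(311) ≈ 1426.5); the function name `describe` is an illustrative choice. Text columns such as Employee_Name cannot be summarized this way, which is presumably why they appear as NaN/NA in the table.

```python
import math
import statistics

# Salaries of the six employees shown in the data preview.
salaries = [62506, 104437, 64955, 64991, 50825, 57568]


def describe(values):
    """Compute a small set of descriptive statistics for numeric values."""
    n = len(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return {
        "n": n,
        "mean": statistics.mean(values),
        "sd": sd,
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "se": sd / math.sqrt(n),  # assumed definition: standard error of the mean
    }


summary = describe(salaries)
for name, value in summary.items():
    print(f"{name:>6}: {value:,.2f}")
```

With only six observations these numbers differ from the full-sample values in the table; the point is the calculation, not the result.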

Data Visualization

Data visualization can also help in understanding the data and getting quick insights.

Exercise

Based on the data you decided to use in your project, discuss within your group:

  • Potential data quality issues
  • Data health screens you would conduct
  • Data pre-processing steps
  • Visualization/Statistics you would investigate

Data Analysis

flowchart 
  %%| fig-width: 10
    A(Business Problem Discovery) --> B(Data Selection)
    B(Data Selection) --> C(Data Preparation)
    C --> D(Data Analysis)

Data Analysis Goal

Analyze the (pre-processed) data to provide insights and recommendations, or to draw conclusions about the business problem.

Before deciding on what types of analyses would suit you best, it is essential to know the following:

  • What is the purpose of the analyses?
    • describe
    • understand
    • predict/classify
  • How interpretable should my model be?
  • Type of data
    • structured
    • unstructured
    • mixed
    • none

What approach to choose?

Among some of the most commonly used tools in people analytics, we have:

  • Analytics: descriptive statistics
  • Inferential Statistics: hypothesis testing, causal modeling (e.g., SEM)
  • Machine Learning:
    • Supervised learning: models for labeled data (i.e., outcome or dependent variable is available)
    • Unsupervised Learning: models for unlabeled data (i.e., outcome or dependent variable is unavailable)
  • Computational modeling and simulations: scenario analysis
  • Natural Language Processing (NLP): text analysis
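The difference between supervised and unsupervised learning can be illustrated with a deliberately tiny, standard-library-only sketch. The supervised part learns one centroid per labeled class (a nearest-centroid classifier); the unsupervised part splits the same values into two groups without looking at the labels (a one-dimensional two-means clustering). The overtime hours, the labels, and the function names are invented for illustration, and real projects would typically rely on a dedicated machine learning library.

```python
import statistics

# Hypothetical weekly overtime hours per employee (invented data).
hours = [1.0, 2.0, 2.5, 3.0, 9.0, 10.0, 11.5, 12.0]
# Labels exist only in the supervised setting (1 = took stress-related leave).
labels = [0, 0, 0, 0, 1, 1, 1, 1]


def fit_centroids(xs, ys):
    """Supervised: learn one mean (centroid) per labeled class."""
    return {
        c: statistics.mean(x for x, y in zip(xs, ys) if y == c)
        for c in set(ys)
    }


def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))


def two_means(xs, iterations=10):
    """Unsupervised: split xs into two groups without using any labels.

    Assumes xs contains at least two distinct values.
    """
    lo, hi = min(xs), max(xs)  # initial cluster centers
    for _ in range(iterations):
        group_lo = [x for x in xs if abs(x - lo) <= abs(x - hi)]
        group_hi = [x for x in xs if abs(x - lo) > abs(x - hi)]
        lo, hi = statistics.mean(group_lo), statistics.mean(group_hi)
    return group_lo, group_hi


centroids = fit_centroids(hours, labels)
print("supervised prediction for 8.0 hours:", predict(centroids, 8.0))

low_group, high_group = two_means(hours)
print("unsupervised groups:", low_group, high_group)
```

The supervised model can name the outcome it predicts because it was given labels; the clustering only reveals that two groups exist and leaves their interpretation to the analyst.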

People Analytics maturity model

People Analytics maturity model (actually)

Descriptive Analytics

Predict-ish Analytics

Predictive Analytics

Prescriptive Analytics

Exercise

Discuss within your group what type of analysis you would conduct for your case study. Specifically, focus on the following aspects:

  • How will your data analysis plan contribute to the project’s success?
  • Does it match the objective you defined earlier?
  • What kind of metric would you use in your analyses (e.g., mean, p-value), and why?
  • In what PA maturity stage (e.g., predictive analytics) would your proposed analysis fall?

Interpretation and Storytelling

flowchart 
  %%| fig-width: 10
    A(Business Problem Discovery) --> B(Data Selection)
    B(Data Selection) --> C(Data Preparation)
    C --> D(Data Analysis)
    D --> E(Interpretation and Storytelling)

Interpretation and storytelling

  • The pyramid principle
  • Data Visualization

The Pyramid Principle

The Pyramid Principle was developed at McKinsey & Company and subsequently published by Barbara Minto (Minto, 2008, 3rd edition).

This method helps structure thinking and convince an audience. The ultimate aim is to significantly improve the impact of a project.

The pyramid principle

The Pyramid Principle is a communication strategy that emphasizes presenting information in a structured and impactful way through a hierarchical structure:

  • Main Message: Concise statement that answers the critical question
  • Main Arguments: Several independent arguments that support the main message
  • Supporting Evidence: Back up your arguments with relevant evidence

This approach simplifies complex information, making it easier for the audience to understand and engage with.

The Pyramid Principle

flowchart TD
    O[Business problem] --> P[Situation, Complication, Question]
    P --> E(Analyses)
    E --> A
    A[Main message] --> |why| B[Main Argument 1]
    A --> |why| C[Main Argument 2]
    A --> |how| D[Main Argument 3]

    B --> b1[ev. 1]
    B --> b2[ev. 2]
    B --> b3[ev. 3]

    C --> c1[ev. 4]
    C --> c2[ev. 5]
    C --> c3[ev. 6]

    D --> d1[ev. 7]
    D --> d2[ev. 8]
    D --> d3[ev. 9]

A Well-being example

flowchart TD
    O[Business problem] --> P[Reduce sick leave]
    P --> E(Analyses)
    E --> A
    A[Promote well-being] --> |why| B[Improves employee health]
    A --> |why| C[Enhances work-life balance]
    A --> |how| D[Foster a positive work environment]

    B --> b1[ev. 1]
    B --> b2[ev. 2]
    B --> b3[ev. 3]

    C --> c1[ev. 4]
    C --> c2[ev. 5]
    C --> c3[ev. 6]

    D --> d1[ev. 7]
    D --> d2[ev. 8]
    D --> d3[ev. 9]

Data Visualization principles

  • Define your message
  • Understand analyses behind the message
  • Pick a suitable graph
  • Check the graph for clarity
    • formulate message
    • less is more
    • guide attention

Data Visualization charts

Implementation and feedback

flowchart 
  %%| fig-width: 10
    A(Business Problem Discovery) --> B(Data Selection)
    B(Data Selection) --> C(Data Preparation)
    C --> D(Data Analysis)
    D --> E(Interpretation and Storytelling)
    E --> F(Implementation and Feedback)
    F --> A

Implementation and feedback

  • Data and Results are only the starting point of a conversation
  • Align with stakeholders
  • Change management
  • Ethical and legal considerations

Summary

flowchart 
  %%| fig-width: 10
    A(Business Problem Discovery) --> B(Data Selection)
    B(Data Selection) --> C(Data Preparation)
    C --> D(Data Analysis)
    D --> E(Interpretation and Storytelling)
    E --> F(Implementation and Feedback)
    F --> A

Questions?

Resources